Boosting under-resourced speech recognizers by exploiting out-of-language data - case study on Afrikaans

Authors

  • David Imseng
  • Hervé Bourlard
  • Philip N. Garner
Abstract

Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we boost the performance of an Afrikaans speech recognizer by using already available data from other languages. To successfully exploit available multilingual resources, we use posterior features, estimated by multilayer perceptrons that are trained on similar languages. For two different acoustic modeling techniques, Tandem and Kullback-Leibler divergence based HMMs, the proposed multilingual system yields more than 10% relative improvement compared to the corresponding monolingual systems only trained on Afrikaans.
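The abstract mentions Kullback-Leibler divergence based HMMs operating on posterior features. As a rough illustration only, the sketch below scores one MLP posterior vector against per-state categorical distributions and picks the closest state. The function names, the divergence direction (state distribution against posterior), and the smoothing constant are assumptions made for this example; the abstract does not specify them.

```python
import math


def kl_divergence(p, q, eps=1e-10):
    """Return KL(p || q) for two discrete distributions of equal length.

    eps is an assumed floor that keeps the logarithm finite when q has zeros.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # terms with p_i = 0 contribute nothing
            total += pi * math.log(pi / max(qi, eps))
    return total


def best_state(posterior, state_distributions):
    """Return the index of the state whose distribution is closest to the posterior.

    Hypothetical helper: each state is a categorical distribution over the
    same classes as the MLP output, and the local score is KL(state || posterior).
    """
    scores = [kl_divergence(state, posterior) for state in state_distributions]
    return min(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    frame_posterior = [0.8, 0.1, 0.1]
    states = [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]]
    print(best_state(frame_posterior, states))
```

In a full recognizer this local score would replace the Gaussian likelihood inside Viterbi decoding; the sketch covers only the per-frame comparison.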


Similar resources

Using out-of-language data to improve an under-resourced speech recognizer

Under-resourced speech recognizers may benefit from data in languages other than the target language. In this paper, we report how to boost the performance of an Afrikaans automatic speech recognition system by using already available Dutch data. We successfully exploit available multilingual resources through (1) posterior features, estimated by multilayer perceptrons (MLP) and (2) subspace Ga...


Crosslingual tandem-SGMM: exploiting out-of-language data for acoustic model and feature level adaptation

Recent studies have shown that speech recognizers may benefit from data in languages other than the target language through efficient acoustic model- or feature-level adaptation. Crosslingual Tandem-Subspace Gaussian Mixture Models (SGMM) successfully combine acoustic model- and feature-level adaptation techniques. More specifically, we focus on under-resourced languages (Afrikaans in ou...


Application of Subspace Gaussian Mixture Models in Contrastive Acoustic Scenarios

This paper describes experimental results of applying Subspace Gaussian Mixture Models (SGMMs) in two very different acoustic scenarios: (a) a Large Vocabulary Continuous Speech Recognition (LVCSR) task on (well-resourced) English meeting data and (b) acoustic modeling of under-resourced Afrikaans telephone data. In both cases, the performance of SGMMs is compared with a conve...


Under-resourced speech recognition based on the speech manifold

Conventional acoustic modeling involves estimating many parameters to effectively model feature distributions. The sparseness of speech and text data, however, degrades the reliability of the estimation process and makes speech recognition a challenging task. In this paper, we propose to use a nonlinear feature transformation based on the speech manifold called Intrinsic Spectral Analysis (ISA)...


Speech recognition for under-resourced languages: Data sharing in hidden Markov model systems

FUNDING: Fund for Scientific Research of Flanders; National Research Foundation (South Africa); South African Department of Arts and Culture: Programme of Collaboration on HLT. For purposes of automated speech recognition in under-resourced environments, techniques used to share acoustic data between closely related or similar languages become important. Donor languages with abundant resources ...




Publication date: 2012